All parts are not created equal: SIAM-LSA
Abstract
Previous research has shown that similarity models relying on any simple combination of features are inadequate. Human similarity judgments are sensitive to the structure of the items being compared. For visual stimuli, the spatial arrangement of the items provides an obvious structure. For textual stimuli, however, the structure of the items must be inferred. Prior research on textual similarity has shown the dominant effect of relational features. We extend that research by examining human judgments of the similarity of sentence pairs within the framework of Goldstone’s (1994) SIAM model, which calculates correspondences between objects and their features. We show that although the simple SIAM-based model fails to account well for the human judgments, a modified version that gives different weights to different semantic roles provides a strong match with human ratings.
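The idea of weighting semantic roles differently can be illustrated with a minimal sketch. This is not the SIAM model or the authors' modified version: the role names, the weight values, and the exact-match `word_similarity` stand-in (where an LSA cosine would presumably be used) are all assumptions made for illustration only.

```python
from typing import Dict

# Hypothetical role weights; the abstract does not report actual values.
ROLE_WEIGHTS: Dict[str, float] = {"agent": 0.5, "action": 0.3, "patient": 0.2}


def word_similarity(a: str, b: str) -> float:
    """Stand-in similarity: 1.0 for identical words (case-insensitive), else 0.0.

    A real system would presumably plug in an LSA cosine here (assumption).
    """
    return 1.0 if a.lower() == b.lower() else 0.0


def role_weighted_similarity(
    s1: Dict[str, str],
    s2: Dict[str, str],
    weights: Dict[str, float] = ROLE_WEIGHTS,
) -> float:
    """Compare two sentences role by role and combine the scores by weight.

    Each sentence is a mapping from semantic role to its filler word.
    Roles missing from either sentence contribute nothing to the score.
    """
    total = sum(weights.values())
    score = 0.0
    for role, weight in weights.items():
        if role in s1 and role in s2:
            score += weight * word_similarity(s1[role], s2[role])
    return score / total


if __name__ == "__main__":
    a = {"agent": "dog", "action": "chased", "patient": "cat"}
    b = {"agent": "dog", "action": "chased", "patient": "mouse"}
    c = {"agent": "cat", "action": "chased", "patient": "dog"}
    print(role_weighted_similarity(a, b))  # agent and action match
    print(role_weighted_similarity(a, c))  # only the action matches
```

Under these assumed weights, a pair that differs only in the patient scores higher than a pair that shares the same words in swapped roles, which an unweighted bag-of-words comparison would not distinguish.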
Similar resources
Diffusive Limit of the Boltzmann Equation with Fluid Initial Layer in the Periodic Domain
We justify the global-in-time diffusive limit of the Boltzmann equation inside a periodic domain T. We only assume that in the initial expansion (1.2) at t = 0, the kinetic parts are well-prepared, but the fluid parts could be general, i.e. the fluid parts are not required to satisfy the incompressibility and Boussinesq relations. For this case, the fluid initial layers are created and preserv...
Latent semantic sentence clustering for multi-document summarization
This thesis investigates the applicability of Latent Semantic Analysis (LSA) to sentence clustering for Multi-Document Summarization (MDS). In contrast to more shallow approaches like measuring similarity of sentences by word overlap in a traditional vector space model, LSA takes word usage patterns into account. So far LSA has been successfully applied to different Information Retrieval (IR) t...
Towards Deeper Understanding of the LSA Performance
The paper presents ongoing work towards a deeper understanding of the factors influencing the performance of Latent Semantic Analysis (LSA). Unlike previous attempts that concentrate on problems such as matrix element weighting, space dimensionality selection, similarity measures, etc., we primarily study the impact of another, often neglected, but fundamental element of LSA (and of any text ...
Not All Words Are Created Equal: Extracting Semantic Orientation as a Function of Adjective Relevance
Semantic orientation (SO) for texts is often determined on the basis of the positive or negative polarity, or sentiment, found in the text. Polarity is typically extracted using the positive and negative words in the text, with a particular focus on adjectives, since they convey a high degree of opinion. Not all adjectives are created equal, however. Adjectives found in certain parts of the tex...
How few is too few? Determining the minimum acceptable number of LSA dimensions to visualise text cohesion with Lex
Building comprehensive language models using latent semantic analysis (LSA) requires substantial processing power. At the ideal parameters suggested in the literature (for an overview, see Bradford, 2008) it can take up to several hours, or even days, to complete. For linguistic researchers, this extensive processing time is inconvenient but tolerated, but when LSA is deployed in commercial sof...